THIS REPORT HAS BEEN DELIMITED
AND CLEARED FOR PUBLIC RELEASE
UNDER DOD DIRECTIVE 5200.20 AND
NO RESTRICTIONS ARE IMPOSED UPON
ITS USE AND DISCLOSURE.

DISTRIBUTION STATEMENT A

APPROVED FOR PUBLIC RELEASE;
DISTRIBUTION UNLIMITED.



Reproduced by

DOCUMENT SERVICE CENTER

KNOTT BUILDING, DAYTON, 2, OHIO


DOCUMENTATION
9521 CONNECTICUT AVENUE
WASHINGTON 8, D
COLUMBIA 5-45


ERRATA

Technical Report No. 6, "The Mechanization
of Coordinate Indexing," prepared under Contract
Nonr 1305(00) for the Office of Naval
Research, September 1954:

. . . attached sheet for p. 15 and 16

. . . page 8, paragraph 2 should read "dedicated
positions 1/16" x 1/16"," rather than 1/16'

. . . eliminate page 19


TECHNICAL REPORT NO. 6
THE MECHANIZATION OF COORDINATE INDEXING

In the summer of 1952, the Armed Services Technical
Information Agency awarded a contract to Documentation
Incorporated for an investigation and experimental
installation of a then completely novel system for filing and
retrieval of information, the Uniterm System of Coordinate
Indexing.

This manual system of storage and retrieval of indexing
information as conceived by Documentation Incorporated
has since been developed and refined in a series
of test programs and has been adopted by numerous
government agencies and industrial organizations.

One of the reasons this system has found such ready
acceptance is that it combines high efficiency and
specificity of information retrieval, compactness and low cost,
and most important, the real possibility of mechanization.
Although, for a system of limited size, the manual Uniterm
System of Coordinate Indexing far exceeds any presently
available machine system in efficiency,* it was realized
that larger systems would still require a form of
mechanization.

In the early part of 1953, certain pre-existing machine
designs for data retrieval were combined with the logic of
the Uniterm System of Coordinate Indexing, resulting in
the conception of the indexing machine described herein.

*"Efficiency" is defined as a combination of cost and
effectiveness of information retrieval.

Shortly after its conception, this system came to the
attention of the Armed Services Technical Information
Agency, which requested Documentation Incorporated to
make an investigation of the logic of this machine and of
some of the proposed constructional features to determine
its suitability to the Armed Services Technical Information
Agency's particular requirements. This investigation has
now been completed, and we are pleased to report that
this system of instantaneous information filing and retrieval
exceeds existing and contemplated machine systems in
efficiency to such an extent as to create an entirely new
dimension in cost and retrieval time.

Until now, systems of sequential scanning of information,
such as the I.B.M. and Remington Rand systems,
have completely monopolized all machine systems as well
as all thinking in the information field. While such machines
are excellent for tabulations, computations, and, in
general, bookkeeping purposes in the widest sense of the
word, they are not especially suited for information storage
and retrieval.

Only a system of simultaneous scanning of superimposed
data storage sheets, corresponding to natural language
elements, as proposed, will provide low cost instantaneous
filing as well as retrieval of information. It is
our conviction that this machine will be a major and
history-making step in the direction of adequate information
control.

Coordinate indexing as a generic term covers all
forms of indexing in which the retrieval of specific items
of information involves the determination of the logical
product of a number of classes. The Uniterm System as
a species of coordinate indexing is a manual method of
determining the logical product of two or more classes
through the device of "arithmetical" coordination. The
discovery of a common number on two or more Uniterm
cards establishes that there is a class which is the
logical product of the classes denoted by the Uniterms
and that the class has members. The members are, of
course, the documents or other items designated by the
common numbers.
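
The arithmetical coordination described above amounts to intersecting sets of document numbers. A minimal sketch in Python follows; the card contents and the helper name `coordinate` are invented for illustration and do not come from the report.

```python
# Each Uniterm card is modeled as the set of document numbers posted on it.
# The terms and numbers below are hypothetical.
uniterm_cards = {
    "air": {101, 104, 117, 230},
    "ducts": {104, 117, 305},
    "icing": {117, 230, 412},
}

def coordinate(cards, *terms):
    """Return the document numbers common to every named card (the logical product)."""
    postings = [cards[term] for term in terms]
    return set.intersection(*postings)

print(sorted(coordinate(uniterm_cards, "air", "ducts")))           # [104, 117]
print(sorted(coordinate(uniterm_cards, "air", "ducts", "icing")))  # [117]
```

A non-empty result corresponds to a class that "has members"; the members are the documents whose numbers survive the intersection.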

One advantage of the Uniterm System's device of
"arithmetical" coordination is that numbers can be written
on the card as the numbered items are received and
indexed. It is not necessary to dedicate in advance any
portion of the card to any number. There is, however,
another manual method of Coordinate Indexing, known as
the Batten system or the Peek-a-Boo system, which
determines the logical product by the coincidence of
identically numbered spaces rather than the mere identity
of numbers. In the Uniterm System, even though some
use is made of geometry (the arrangement of numbers in
ten columns) a number at the head of a column on one
card may be identical with a number at the foot of a
column on another card. But, as we can note from Exhibit
I, in the Batten system, the position of each number must
be preassigned and must be in the same place on each
card. This preassignment of numbers to dedicated
positions gives the Batten system one important and obvious
advantage over the Uniterm System, in that coordination
becomes a simple rapid process involving as many items
as are desired; whereas in the Uniterm System numbers
must be matched for any two terms before going on to a
third, or fourth term, etc. On the other hand, the use
of dedicated positions limits drastically the capacity of
the Batten system as compared with the Uniterm System.
As will be seen from Exhibit I, by the time item
101 enters our system, we are beyond the capacity of our
Batten cards and must set up an entire new set. And
this may occur with only an average use of 0.01% or less
of the dedicated positions on the first set.

[Exhibit I: Batten card]

The Office of Basic Instrumentation of the National
Bureau of Standards is experimenting with Batten cards
having a capacity of 18,000 positions. An interesting
and important account of this experiment has appeared
recently,(1) but this account does not reveal any
information concerning the average or mean density of use of the
dedicated positions. We, ourselves, lacking the evidence,
which we hope will be forthcoming from additional reports
on the Office of Basic Instrumentation project, felt that
for large rapidly growing files of material the limitations
imposed by the dedicated positions of a Batten system
outweighed the apparent advantages of rapid coordination.
These limitations can be expressed in terms of certain
theoretical considerations together with certain evidence
which is now available. We know, for example, that the
indexing of 18,000 documents will involve a vocabulary
in the range of 5,000 terms. We also know that the
average document can be indexed using 8 to 10 terms.


(1) "Documentation in Instrumentation," National Bureau of
Standards Report 3276, by W. A. Wildhack, J. Stern,
and J. Smith, April 1954.


Taking the higher figure, we obtain the total of 180,000
postings for 18,000 documents or 5,000 Batten cards.
This means an average utilization of 180,000/5,000 or 36 holes
on each card. Each Uniterm card (front and back) for a
manual system has a capacity of 520 numbers if all
columns are equally utilized. In actual practice, all
columns will not be equal, but the percentage of difference
between the longest column and the shortest column will
decrease as cards fill up (Exhibit II makes this clear).

It is, therefore, conservative to assume that we can
get 400 numbers on a Uniterm card. With the same
size vocabulary, 5,000, and the same number of postings
per document, our 5,000 cards will provide for
2,000,000 postings or 200,000 documents. In a Batten
system having 18,000 holes we would require 200,000/18,000
or 10.2 sets of 5,000 cards for 200,000 documents;
that is to say, 11 sets, since there can be no partial
sets in a Batten system. Further, the advantage of
rapid coordination in a single set would be lost
through the necessity of selecting cards out of 11
sets and of performing 11 separate coordinating
operations.
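
The capacity comparison above is plain arithmetic and can be checked with a short sketch. The variable names are illustrative; the input figures are the ones quoted in the text. Computed directly, 200,000/18,000 comes to about 11.1, which rounds up to 12 whole sets, so the sketch will not reproduce the printed figures of 10.2 and 11 exactly; it only shows the straightforward calculation.

```python
import math

documents = 18_000          # documents indexed
terms_per_document = 10     # the higher figure of the 8-to-10 range
vocabulary = 5_000          # one card per term
batten_positions = 18_000   # dedicated positions on each Batten card
uniterm_capacity = 400      # conservative count of numbers per Uniterm card

postings = documents * terms_per_document       # 180,000 postings
holes_per_card = postings / vocabulary          # 36 holes on the average card
density = holes_per_card / batten_positions     # 0.002, i.e. 0.2 percent

uniterm_postings = vocabulary * uniterm_capacity             # 2,000,000
uniterm_documents = uniterm_postings // terms_per_document   # 200,000

sets_exact = uniterm_documents / batten_positions   # about 11.1
sets_needed = math.ceil(sets_exact)                 # no partial sets allowed

print(holes_per_card, f"{density:.1%}", uniterm_documents, sets_needed)
```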

It is obvious that one can cut down the number of
sets in a Batten system by increasing the size of the
card and the number of dedicated positions on each card,
but such an increase in size does not increase the relative
efficiency of the system since the percentage of dedicated
positions used will remain constant.

In spite of the above considerations, we never lost
sight of the manifest advantages of geometric coordination
as compared with arithmetic coordination, especially
with respect to systems large enough to make mechanization
feasible and desirable. For quite early in our
program it seemed clear that the mechanization of arithmetic
coordination (what is usually called collation(2)) is
not a promising technique in information storage and
retrieval. The machine comparison of two or more sets of
numbers in order to find any numbers common to the
sets must, in principle, be a sequential process, whereas
the machine comparison of dedicated positions in two or
more Batten cards can be relatively instantaneous.
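
The contrast between sequential collation and the coincidence of dedicated positions can be sketched as follows. This is only an analogy in software, under the assumption that a Batten card is modeled as a bit mask with one bit per dedicated position; the function names are hypothetical.

```python
def collate(a, b):
    """Arithmetic coordination: step through two sorted number lists in sequence."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common

def punch(numbers):
    """Geometric coordination: bit k is set when item k is punched on the card."""
    mask = 0
    for k in numbers:
        mask |= 1 << k
    return mask

def coincide(*masks):
    """Superimpose the cards; every position is compared in a single AND per card."""
    result = masks[0]
    for mask in masks[1:]:
        result &= mask
    return [k for k in range(result.bit_length()) if (result >> k) & 1]

air = [3, 8, 15, 42]
ducts = [8, 9, 42, 77]
print(collate(air, ducts))                  # [8, 42]
print(coincide(punch(air), punch(ducts)))   # [8, 42]
```

Both routes give the same logical product; the difference lies in the work done, since the collation loop grows with the number of postings while the superimposed comparison treats all positions at once.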

As was said above, one apparent solution to the
limitations of the Batten system involves increasing the
size of the card; and if the card is to be positioned and
scanned mechanically, only the size of available materials,
the size of holes, and the problems of mechanical
positioning affect the decision on this matter. In a series
of tests we ascertained that a 3' by 3' metal or plastic
sheet between 0.015" and 0.020" thick, which is not
too big or too heavy for mechanical handling, will provide
dedicated positions 1/16" x 1/16" for 250,000 items.
Having made this determination, we were still faced
with three problems:

1. No. of Sheets:

5,000 3" x 5" or 5" x 8" cards is a comparatively
small file; but 5,000 3' x 3' metal sheets, each
of which is .020" thick, represent a cube of
metal 3' x 3' x 8.3'. This seemed to carry us
beyond the range of practicality.

2. Addition of New Sheets:

Even though the number of Uniterms in any system
remains fairly constant and new Uniterms are added
infrequently, we must provide for such additions.
This means that in addition to our initial
cube of metal 3' x 3' x 8.3', we must have a stock
of extra 3' x 3' sheets for coding and adding to
our basic set within the machine. This, too, we
determined was impractical.

3. Use of Dedicated Area:

The recognition that our total coding space is a
cube brings with it conclusive evidence that the
Batten system, regardless of the number of positions
on a card, is an inefficient mechanism for
storing information. Of the metal cube, 0.2% would
be used for storing information; the remainder
would be potentially useful but actually unused
space.


(2) "Machines and Classification in the Organization of
Information", Technical Report No. 2, Documentation
Incorporated, December 1953, p. 10 to 27.
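
The sheet dimensions quoted above can be checked with a few lines of arithmetic. This sketch assumes positions 1/16 inch square (as the errata slip specifies) and the thicker .020 inch sheet; the constant names are illustrative.

```python
SHEET_SIDE_IN = 36           # a 3-foot side, in inches
POSITION_SIDE_IN = 1 / 16    # each dedicated position is 1/16 inch square
SHEET_THICKNESS_IN = 0.020   # thicker end of the tested range
SHEETS = 5_000               # one sheet per Uniterm

positions_per_side = int(SHEET_SIDE_IN / POSITION_SIDE_IN)   # 576
positions_per_sheet = positions_per_side ** 2                # 331,776
stack_height_ft = SHEETS * SHEET_THICKNESS_IN / 12           # about 8.3 feet

print(positions_per_sheet, round(stack_height_ft, 1))
```

On these assumptions a sheet offers somewhat more than the 250,000 positions cited, which would leave room for margins and handling, and the stack of 5,000 sheets is the 3' x 3' x 8.3' cube mentioned in the text.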

If we neglect, for a moment, the problem of adding
sheets, we can see that the problem of the number of
sheets and the problem of the percentage of space used
on each sheet are two sides of the same coin. For the
same number of bits of information a higher density of
use of each sheet would involve fewer sheets. The solution
to this problem, which is expressible in the same
type of analysis as that which led to the concept of
Uniterm, was first developed and patented by Dr. Frederick
Jonker, who, besides being a member of the staff of
Documentation Incorporated, has organized his own
business machine company. Dr. Jonker recognized that
just as the ideas in any system of information could be
expressed as logical products of terms, so any term
can be expressed as a logical product of simpler elements.
Further, just as the number of terms in any
system is much less than the number of ideas, so the
number of letters, as is obvious, is much less than
the number of terms.

It will be said that a word is not a simple logical
product of letters, but a product of letters in a certain
order. Tar is not the same word as rat,
nor is dog the same word as god. But suppose, instead
of one alphabet of 26 letters, we have an alphabet of three
times this number or 78. In this artificial alphabet we
will have three "A's"--A1, A2, A3; three "B's"--B1, B2,
B3; three "C's"--C1, C2, C3, etc. With such an alphabet
it is clear we can express any three letter word, without
ambiguity, as a logical product of letters. For example:
T1 A2 R3; R1 A2 T3; D1 O2 G3; G1 O2 D3. If we extend
the number of letters in our alphabet in this fashion,
or, what is the same thing, extend the number of alphabets
in our system, we can express as logical products longer
and longer words. With 260 letters or 10 alphabets,
for example, we can express every word up to 10 letters
long not only in the English language, but in any language.
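
The positional alphabet can be illustrated with a short sketch. The function name `letter_product` and the default of 10 alphabets are assumptions made for the example; the tagging scheme (letter followed by its position) follows the T1 A2 R3 notation in the text.

```python
def letter_product(word, alphabets=10):
    """Express a word as a set of position-tagged letters, e.g. 'tar' -> {'T1', 'A2', 'R3'}."""
    word = word.upper()
    if len(word) > alphabets:
        raise ValueError("word is longer than the number of alphabets")
    return {f"{letter}{position}" for position, letter in enumerate(word, start=1)}

print(sorted(letter_product("tar")))                     # ['A2', 'R3', 'T1']
print(set("tar") == set("rat"))                          # True: one alphabet cannot tell them apart
print(letter_product("tar") == letter_product("rat"))    # False: positional letters can
```

Because each tag carries its position, the set is unordered yet still distinguishes tar from rat and dog from god, which is what allows a word to be treated as a simple logical product.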

Let us examine, now, the manner in which this
development solves the three problems we have presented
above:

1. No. of Sheets. Instead of 5,000 cards or sheets,
that is, one for each Uniterm, we now need
only 26 times the number of alphabets. For
ease of multiplication in what follows, suppose
we assume 10 alphabets and round off the size
of system at 300 sheets (10 x 26 + 40 for symbols,
Greek letters, and other devices).

2. Addition of New Sheets. It should be apparent
that, like the letters in the alphabet or the digits
in the number system, our system is constant
and no new sheets will ever be required
[...] to the system.

3. Use of Dedicated Area. On simple mathematical
grounds it is clear that the same amount
of information, stored in a smaller number
of sheets, must operate to increase the density
of use of each sheet. The exact manner in
which this comes about can be seen from a
comparison with the manner in which a manual
Uniterm system stores information which
is dispersed in a standard library system.


Suppose that in a standard indexing system there
are three headings:

air ducts
air blowers
air cleaners

In the Uniterm System all items on air ducts, all
items on air blowers, all items on air cleaners, are
entered on the Air Uniterm card.
Similarly in the letter system, all items on accuracy,
airplane, acid, accelerometers, etc., will be entered on
the A1 card or sheet; all items on acid, accuracy,
accelerometers, etc., will be entered on the C2 sheet;
accuracy and accelerometers will be entered on the C3
sheet; airplane and accelerometer will be entered on the
L5 sheet, etc.

It will be recalled that with the Batten system, with
a vocabulary of 5,000 terms with each item indexed by
10 terms, the average use of dedicated positions was 0.2
of 1 percent. For a system of 18,000 items or 18,000
dedicated positions, 0.2 percent would be 36 per sheet.
For our 3' x 3' sheets with dedicated positions for
250,000 items, 0.2 percent would be 500.

With the letter system, if we assume an equal or
random use of all letters, and the same depth of
indexing, we would use one-third of the dedicated positions
on each sheet. There are three ways in which we
arrive at this figure:


1. For the same number of items of information
and the same depth of indexing, the density of postings
varies inversely as the number of sheets; thus

     5,000 sheets / 300 sheets = x postings / 500 postings,   x = 25,000/3.

Multiplying this figure by 10, to provide for 10 postings
per term in the letter system, we have 250,000/3 postings,
or 33 1/3 percent of the possible positions posted on
each sheet.

2. If there are 300 sheets in the system and each
sheet contains 250,000 positions, we have a total number
of 75,000,000 storage positions in the system.
If each item is indexed by ten terms, each being 10
letters long, this would involve using 100 holes per item.
Thus, the indexing of 250,000 items will utilize 100 x
250,000 or 25,000,000 positions, which again is one-third
of the possible positions in the total system.

3. Suppose our 300 cards represent 10 alphabets of
30 letters. Then any document indexed by 10 terms will
utilize, under random distribution, one-third of the
letters in each alphabet for each document or dedicated
position.

Hence, for 250,000 documents one-third of the positions
will be used on all the sheets.
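
The three routes to the one-third figure can be verified with exact fractions. This is a sketch of the arithmetic only; the variable names are illustrative, and the assumption that every term occupies 10 letter positions is the one the text itself makes.

```python
from fractions import Fraction

sheets = 300                   # 10 alphabets of 30 characters each
positions_per_sheet = 250_000
items = 250_000
terms_per_item = 10
letters_per_term = 10          # every term treated as 10 letters long

# Route 1: scale the Batten figure of 500 postings per sheet by the ratio of sheets.
batten_postings_per_sheet = 500
scaled = Fraction(5_000, sheets) * batten_postings_per_sheet * letters_per_term
density_by_scaling = scaled / positions_per_sheet

# Route 2: positions used against total positions in the system.
total_positions = sheets * positions_per_sheet                      # 75,000,000
positions_used = terms_per_item * letters_per_term * items          # 25,000,000
density_by_count = Fraction(positions_used, total_positions)

# Route 3: 10 terms each punch one of the 30 characters in every alphabet.
density_by_alphabet = Fraction(terms_per_item, 30)

print(density_by_scaling, density_by_count, density_by_alphabet)    # 1/3 1/3 1/3
```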

It should now be clear that the use of language
elements rather than Uniterms solves the three problems
of mechanization, namely, the size of the file, the
permanence of the file and the efficient use of coding area
or dedicated positions. We have, as we shall see below,
introduced a new problem which is not present in the
Uniterm System or the Batten system.

The logical product achieved in a Uniterm or Batten
system may be ambiguous. Thus, if we combine fish
and food, we will get information on food for fishes and
fish as a food. This type of ambiguity can usually be
resolved by adding another term as follows: fish-food-
vitamin or fish-food-plankton. "Noise," or information
which has no relevance to the terms coordinated, can
never appear in the logical product presented by a
Uniterm or Batten system. But when we go from Uniterms
to language elements, we, in effect, go from
a system of direct coding to superimposed coding, and
this introduces the problem of "noise" or false drops
in the information system.
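
How superimposed coding produces a false drop can be seen in a toy model of the letter sheets. Everything here is hypothetical: the sheet store, the helper names and the three-letter terms are chosen only to make the effect visible, and the sketch does not represent the machine's actual mechanism.

```python
from collections import defaultdict

# Maps a position-tagged letter such as "C1" to the document numbers posted on that sheet.
sheets = defaultdict(set)

def tags(word):
    """Position-tagged letters of a word, e.g. 'cat' -> ['C1', 'A2', 'T3']."""
    return [f"{letter}{position}" for position, letter in enumerate(word.upper(), start=1)]

def index(doc_number, terms):
    """Post a document on the sheet for every letter of every indexing term."""
    for term in terms:
        for tag in tags(term):
            sheets[tag].add(doc_number)

def search(word):
    """Documents posted on all the sheets that spell out the search word."""
    result = None
    for tag in tags(word):
        posted = sheets.get(tag, set())
        result = set(posted) if result is None else result & posted
    return result or set()

index(1, ["cat", "bog"])   # never indexed under "bat"
index(2, ["bat"])
print(sorted(search("bat")))   # [1, 2]
```

Document 1 is delivered for "bat" because the letters of cat and bog, superimposed on the same position, happen to cover B1, A2 and T3. That is a false drop, and it is the kind of noise that direct coding on one card per term cannot produce.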

We have effectively solved this problem in our machine
development, so that noise or irrelevant information
delivered by the machine in answering any question,
although possible in principle, is for all practical purposes
eliminated. The exact nature of our solution involves
certain mathematical and mechanical considerations
which we will not present at this time. But we can
illustrate the problem as it occurs in a simpler system,
the well known Zatocoding system, developed by
Calvin Mooers.

In any indexing system we must make a distinction
between the number of headings or entries by which an
item is indexed and the number of headings used either
singly or in combination to retrieve any item of information.
In a standard subject-cataloging system, or a
system of indexing like the Index to Chemical Abstracts,
we may enter items under a number of different headings.
But we always search for information under one
heading at a time. If the material we retrieve under any
heading is not sufficient or adequate to our purposes, we
may try another heading, which may be broader, narrower,
or just different from the heading first used. But
in no sense is such a search a search for material
indexed under a combination of headings. Zatocoding is
a species of coordinate indexing. This means that in
Zatocoding items are indexed under a group of headings
and searched for under any heading or any combination
of headings. But the fact that an item is indexed under
10 terms or descriptors does not mean that questions
put to the system should or do consist of 10 terms or
descriptors. In fact a question may be asked under one
term; and it is our experience that the great bulk of
questions involve 2 or 3 terms and that additional
terms are added only if the amount of information
delivered by a 2 or 3 termed question proves to be too
voluminous. In other words, actual reference experience
and not mathematical considerations alone must
influence the design of a system. A system of coordinate
indexing which delivers too high a percentage of
"noise" for one, two or three term questions is simply
not adequate for reference use.

In the Zatocoding system random numbers correspond
to the language elements in the indexing system
we have described. Every term or subject is represented
by a random group of four numbers chosen from
a field of forty numbers, e.g.:

     air       3    12    17    23
     ducts     6    17    24    29
     icing    11    12    23    27

We assume that for any item indexed, no more than
1/2 of the coding area is used. (This figure corresponds
to our use of 1/3 of the dedicated positions in our machine
development.)

The formula (approx.) given by Calvin Mooers for
the dropping fraction in any search is 1/2^n where n
equals the number of notches or holes used in a search.
For example, if we search a collection for everything
on Air, n = 4 (3, 12, 17, 23); for everything on Air Ducts,
n = 7 (3, 12, 17, 23, 6, 24, 29); for everything on Air Ducts
Icing, n = 9 (3, 12, 17, 23, 6, 24, 29, 11, 27).

     n        4     5     6     7      8      9      10      11      12
     1/2^n   1/16  1/32  1/64  1/128  1/256  1/512  1/1024  1/2048  1/4096
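
The count of notches and the resulting dropping fraction can be reproduced from the three example codes. A minimal sketch, assuming the codes given in the text for air, ducts and icing; the helper names are invented for the example.

```python
# Zatocoding codes for the three example terms, as listed in the text.
codes = {
    "air":   {3, 12, 17, 23},
    "ducts": {6, 17, 24, 29},
    "icing": {11, 12, 23, 27},
}

def notches(*terms):
    """Distinct notch positions needed to search for all the given terms together."""
    return set().union(*(codes[term] for term in terms))

def dropping_fraction(n):
    """Mooers' approximation when half of the coding area is in use."""
    return 1 / 2 ** n

for query in (("air",), ("air", "ducts"), ("air", "ducts", "icing")):
    n = len(notches(*query))
    print(" ".join(query), "n =", n, f"dropping fraction 1/{2 ** n}")
```

Shared numbers (17 between air and ducts, 12 and 23 between air and icing) are counted once, which is why three terms need 9 notches here rather than 12.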

It is only necessary to multiply the dropping fraction by
the size of any collection to determine the approximate
average number of false drops per search. This table shows
the size of the dropping fraction for various sizes of n.


                              SIZE OF COLLECTION

      n    Dropping
           Fraction     1,000     10,000    100,000    250,000

      4    1/16         62.5      625.0      6,250     15,625
      5    1/32         31.25     312.5      3,125      7,813
      6    1/64         15.63     156.3      1,563      3,908
      7    1/128         7.82      78.2        782      1,955
      8    1/256         3.91      39.1        391        978
      9    1/512         1.96      19.6        196        490
     10    1/1024        0.98       9.8         98        245
     11    1/2048        0.49       4.9         49        123
     12    1/4096        0.25       2.5         25         63

If, as is the case, the average reference question
involves the coordination of two to three terms, the
average number of false drops in a Zatocoding system
containing 100,000 items will range from 3,125 items
(when n = 5, the minimum number of notches used for
2 terms) to 25 items (when n = 12, the maximum number
of notches which can be used for 3 terms).
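
The false-drop figures can be regenerated by multiplying each collection size by 1/2^n. This sketch computes the values directly; a few come out slightly below the printed entries (about 24.4 rather than 25 for n = 12 and 100,000 items), which suggests the printed table was built by repeated halving with rounding up, though that is an inference rather than something the report states.

```python
collection_sizes = [1_000, 10_000, 100_000, 250_000]

def false_drops(n, size):
    """Approximate average false drops per search: collection size times 1/2**n."""
    return size / 2 ** n

for n in range(4, 13):
    row = "  ".join(f"{false_drops(n, size):>10,.2f}" for size in collection_sizes)
    print(f"{n:>2}  1/{2 ** n:<5} {row}")

low_noise = false_drops(12, 100_000)   # three terms with no shared notches
high_noise = false_drops(5, 100_000)   # two terms sharing three notches
print(round(low_noise, 1), high_noise)
```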

In the mechanized system we are developing, for a
search by 2 terms, we are planning a dropping fraction
of 1/3^20. This means that for all practical purposes there
will be no false drops or noise in a system of 250,000
items.
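
The claim of practically no false drops can be checked numerically under one reading of the planned figure. The sketch assumes the dropping fraction is (1/3)^20, that is, one-third of the dedicated positions in use and 20 positions examined for a two-term question (2 terms of 10 letters each). Both the reading and the variable names are assumptions, not statements from the report.

```python
items = 250_000
density = 1 / 3            # share of dedicated positions in use
positions_searched = 20    # assumed: 2 terms of 10 letters each

dropping_fraction = density ** positions_searched
expected_false_drops = items * dropping_fraction

print(f"{expected_false_drops:.1e}")
```

On this reading the expected number of false drops per search is far below one, in contrast with the hundreds or thousands that the Zatocoding table shows for two-term questions.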

We present in the following a summary statement
of the characteristics of the machine development which
are presumed to be of general interest:

Nature of the Development: The proposed machine is
based on proven and tested principles and does not
contain any speculative elements.

Operation of the Machine: In entering information -
merely typing out the indexing terms or phrases on
a keyboard; in searching information - merely
typing out the search words or phrases on this same
keyboard. The document numbers of the relevant
documents* will be printed on a paper tape.

Proposed Document* Storage Capacity Per Machine:
250,000 documents per unit. However, units of
double this capacity are feasible.

Ultimate Total Document* Storage Capacity: Unlimited.
A number of machines can be operated simultaneously
from a single control console.

Information Capacity Per Document*: About 10 descriptive
words or phrases per document is at present contemplated.
This is equivalent to approximately 2 x 10^[exponent illegible]
digital units of information. This number can be
increased if necessary.


*"Document" should be understood as an article or item
on which information is to be stored. "Documents" can
be books, reports, medical case records, persons,
criminals, patents, court decisions, articles of a
supply catalog, etc.

[...] potentially unlimited capacity of words, [...]
etc., in any language and using any type of [...]
such as Arabic and Roman numerals, Greek [...]
etc.

Coding Requirements: None; actual language elements are
used as machine code. No code conversion [...]

Storage Speed: [...] information on a keyboard. Following [...]
storage is virtually instantaneous.

Searching Speed: Following typing of search [...]
matter of a few seconds per document [...]

Nature of Storage Elements: Large metal sheets [...]
most permanent, durable type of storage.

Machine Dimensions: Approximately 7' x 4' x [...]
about 1/20 of volumetric space occupied [...]
equivalent conventional catalog card drawers.

Assembled Weight: 1,000 lbs. giving approximately [...]
lbs. per sq. ft. floor loading.

Nature of Techniques Used: Entirely mechanical [...]
electromechanical. No electronic elements [...]
reliability.

Reproduction [...]:
Stored information can be reproduced [...]
economically by [...]

Cost of the Machine When Manufactured in [...]:
[...] 1,000,000 documents the cost of the [...]
as low as the cost of the filing cabinets [...]
index cards plus the index cards, required [...]
conventional card catalog system.